Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization
Similar resources
Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization
The filtering feature-selection algorithm is an important approach to dimensionality reduction in text categorization. Most filtering feature-selection algorithms evaluate the significance of a feature for a category based on a balanced dataset and do not consider the imbalance factor of the dataset. In this paper, a new scheme is proposed, which can weaken the adverse effec...
An Improved Feature Selection Method in Chinese Text Categorization
This paper reports a method that improves feature-selection performance in Chinese text categorization. The paper first uses an improved information gain (IG) to select the initial features, and experiments with two methods for feature selection. The first method chooses the top n features of all classes as the vector space, while the second method chooses the top k features for each class...
A Data-drive Feature Selection Method in Text Categorization
Text Categorization (TC) is the process of grouping texts into one or more predefined categories based on their content. It has become a key technique for handling and organizing text data. One of the most important issues in TC is Feature Selection (FS). Many FS methods have been put forward and are widely used in the TC field, such as Information Gain (IG), Document Frequency thresholding (DF) and Mu...
Categorical Proportional Difference: A Feature Selection Method for Text Categorization
Supervised text categorization is a machine learning task where a predefined category label is automatically assigned to a previously unlabelled document based upon characteristics of the words contained in the document. Since the number of unique words in a learning task (i.e., the number of features) can be very large, the efficiency and accuracy of the learning task can be increased by using...
A Knowledge-Based Feature Selection Method for Text Categorization
A major difficulty of text categorization is the high dimensionality of the original feature space. Feature selection plays an important role in text categorization. Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), mutual information (MI), and so on are commonly applied in text categorization. Many existing experiments show IG is one of th...
Journal
Journal title: The Scientific World Journal
Year: 2014
ISSN: 2356-6140,1537-744X
DOI: 10.1155/2014/625342